
feat: vulnerability scanning within git integration (IN-956) #3892

Open
epipav wants to merge 2 commits into main from feat/git-osv-vulnerabilities

Conversation

@epipav (Collaborator) commented Mar 3, 2026

Adds automated vulnerability scanning for all git repositories using the Google OSV Scanner SDK. Runs on the first clone batch per repo and persists results directly to the insights database.

Architecture
Go binary wrapped in Python: OSV Scanner is a Go library with no Python bindings. We embed it as an SDK dependency in a Go binary, call it programmatically there, and invoke the binary from Python following the same subprocess + JSON stdout pattern as the software-value service.

The binary exits with code 0 and communicates errors through the JSON payload, so the Python subprocess machinery never misinterprets a non-zero exit as a crash.
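A minimal sketch of how the Python side might interpret one scanner run under this convention. The function name and the payload keys (`status`, `error_message`) are assumptions for illustration, not the actual schema emitted by the binary.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScanOutcome:
    ok: bool
    error: Optional[str] = None
    payload: dict = field(default_factory=dict)


def interpret_scanner_output(returncode: int, stdout: str) -> ScanOutcome:
    """Classify one scanner run from its exit code and JSON stdout."""
    if returncode != 0:
        # Handled errors travel in the JSON payload, so a non-zero exit
        # should only happen when the process itself died (crash, signal).
        return ScanOutcome(False, f"scanner exited with code {returncode}")
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return ScanOutcome(False, f"invalid JSON on stdout: {exc}")
    if not isinstance(payload, dict):
        return ScanOutcome(False, "unexpected JSON payload type")
    # Hypothetical payload shape: {"status": "...", "error_message": "..."}
    if payload.get("status") == "failure":
        message = payload.get("error_message") or "unknown scanner error"
        return ScanOutcome(False, message, payload)
    return ScanOutcome(True, None, payload)
```

With this split, a handled scanner error and a genuine process crash end up on different code paths even though both are reported as failures.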

Design decisions
Vulnerability identity: (repo_url, vulnerability_id, package_name, source_path) — same CVE can appear in multiple packages and lockfiles

ID classification: primary ID + aliases sorted into cve_ids, ghsa_ids, other_ids arrays by prefix

Severity: derived from CVSS numeric score using standard thresholds (CRITICAL/HIGH/MEDIUM/LOW)

Status tracking: OPEN (no fix known), FIX_AVAILABLE (patch exists), RESOLVED (no longer detected)

Database strategy: upsert + mark-resolved (not delete + insert) — preserves full history of when vulnerabilities were first detected, last seen, and resolved

Transitive scanning: resolves full dependency graph by default; falls back to direct-only on timeout (3min) for first scans; subsequent scans reuse the previous mode

OOM handling: on any scanner crash, marks stale running scan records as failure; on OOM specifically (SIGKILL), retries with --no-transitive to skip the most memory-intensive part

Scan tracking: every invocation creates a vulnerability_scans row (running → success/failure/no_packages_found) with duration, counts, and errors
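The ID classification and severity rules from the list above could look roughly like the sketch below. It is written in Python for illustration only (the PR implements this logic in Go), the function names are hypothetical, and the thresholds are the CVSS v3 qualitative ratings; treating a missing or zero score as "no severity" is an assumption.

```python
from typing import Dict, Iterable, List, Optional


def classify_ids(primary_id: str, aliases: Iterable[str]) -> Dict[str, List[str]]:
    """Sort the primary ID and its aliases into buckets by prefix."""
    buckets: Dict[str, List[str]] = {"cve_ids": [], "ghsa_ids": [], "other_ids": []}
    seen = set()
    for vuln_id in [primary_id, *aliases]:
        if vuln_id in seen:
            continue  # skip duplicates so each ID is stored once
        seen.add(vuln_id)
        upper = vuln_id.upper()
        if upper.startswith("CVE-"):
            buckets["cve_ids"].append(vuln_id)
        elif upper.startswith("GHSA-"):
            buckets["ghsa_ids"].append(vuln_id)
        else:
            buckets["other_ids"].append(vuln_id)
    return buckets


def severity_from_cvss(score: Optional[float]) -> Optional[str]:
    """Map a CVSS base score to a severity label (CVSS v3 rating scale)."""
    if score is None or score <= 0.0:
        return None  # assumption: no score means no severity label
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"
```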

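The upsert + mark-resolved strategy from the design decisions can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database purely so the example runs anywhere; the real target is the insights database, and the table layout, column names, and the "not seen in this scan" test based on `last_seen_at` are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE vulnerabilities (
    repo_url TEXT NOT NULL,
    vulnerability_id TEXT NOT NULL,
    package_name TEXT NOT NULL,
    source_path TEXT NOT NULL,
    status TEXT NOT NULL,
    first_detected_at TEXT NOT NULL,
    last_seen_at TEXT NOT NULL,
    resolved_at TEXT,
    PRIMARY KEY (repo_url, vulnerability_id, package_name, source_path)
)
"""

# On conflict only the mutable columns change, so first_detected_at survives.
UPSERT = """
INSERT INTO vulnerabilities
    (repo_url, vulnerability_id, package_name, source_path,
     status, first_detected_at, last_seen_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (repo_url, vulnerability_id, package_name, source_path)
DO UPDATE SET
    status = excluded.status,
    last_seen_at = excluded.last_seen_at,
    resolved_at = NULL
"""


def persist_scan(conn, repo_url, findings, scanned_at):
    """Upsert current findings, then resolve rows this scan did not touch.

    findings: iterable of (vulnerability_id, package_name, source_path, status)
    scanned_at: ISO-8601 timestamp string, assumed to increase per scan
    """
    with conn:  # one transaction for the whole scan
        for vuln_id, package, path, status in findings:
            conn.execute(
                UPSERT,
                (repo_url, vuln_id, package, path, status, scanned_at, scanned_at),
            )
        conn.execute(
            "UPDATE vulnerabilities SET status = 'RESOLVED', resolved_at = ? "
            "WHERE repo_url = ? AND last_seen_at < ? AND status <> 'RESOLVED'",
            (scanned_at, repo_url, scanned_at),
        )
```

Compared with delete + insert, rows that disappear from a later scan stay in the table with a `RESOLVED` status and a resolution timestamp, which is what keeps the history intact.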

Note

Medium Risk
Adds a new repo-processing stage that shells out to a new Go scanner binary and writes/upserts vulnerability data into the insights DB, plus new analytics datasources/pipes; failures/timeouts/OOM handling could impact worker throughput and DB load.

Overview
Adds automated vulnerability scanning to git integration. The repository worker now runs a new VulnerabilityScannerService on the first clone batch, tracking execution via OperationType.VULNERABILITY_SCAN.

A new Go binary (vulnerability-scanner) is built into the git-integration Docker image and invoked via run_shell_command (now supports real-time stderr streaming and propagates subprocess returncode). The scanner creates/finalizes vulnerability_scans, upserts vulnerabilities with a resolve+upsert strategy, supports transitive dependency scanning with a --no-transitive fallback on timeout/OOM, and reads new INSIGHTS_DB_* env vars.
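The `--no-transitive` fallback on timeout/OOM might be decided along these lines. This is a simplified sketch: the helper names and the positional repository-path argument are assumptions, and it ignores the rule that subsequent scans reuse the previous mode. It relies on Python reporting death by signal as a negative return code, while a shell wrapper would report 128 plus the signal number.

```python
import signal
from typing import List, Optional

TRANSITIVE_TIMEOUT_SECONDS = 180  # the 3 minute limit from the PR description


def is_oom_kill(returncode: Optional[int]) -> bool:
    """True when the exit status looks like a SIGKILL, as sent by the OOM killer."""
    return returncode in (-signal.SIGKILL, 128 + signal.SIGKILL)


def build_scanner_args(repo_path: str, transitive: bool) -> List[str]:
    """Build the command line; the positional repo path is a hypothetical layout."""
    args = ["vulnerability-scanner", repo_path]
    if not transitive:
        args.append("--no-transitive")
    return args


def should_retry_without_transitive(
    transitive: bool, timed_out: bool, returncode: Optional[int]
) -> bool:
    """Retry direct-only when a transitive scan timed out or was OOM-killed."""
    return transitive and (timed_out or is_oom_kill(returncode))
```

Note that SIGKILL alone does not prove an out-of-memory condition, so a caller may still want to log the retry reason as "probable OOM".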

Adds Tinybird vulnerabilities/vulnerability_scans datasources and summary/list/breakdown pipes to query vulnerability counts by severity/ecosystem and last scan status.

Written by Cursor Bugbot for commit 0ea8a21. This will update automatically on new commits.

@github-actions bot (Contributor) commented Mar 3, 2026

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.


@epipav marked this pull request as draft March 3, 2026 16:44
@epipav marked this pull request as ready for review March 5, 2026 14:48

@epipav requested a review from mbani01 March 5, 2026 14:52

@epipav (Collaborator, Author) commented Mar 5, 2026

related: linuxfoundation/insights#1725

@epipav changed the title from "feat: vulnerability scanning within git integration" to "feat: vulnerability scanning within git integration (IN-956)" Mar 5, 2026
Comment on lines +102 to +109
conn = await asyncpg.connect(
    user=os.environ["INSIGHTS_DB_USERNAME"],
    password=os.environ["INSIGHTS_DB_PASSWORD"],
    database=os.environ["INSIGHTS_DB_DATABASE"],
    host=os.environ["INSIGHTS_DB_WRITE_HOST"],
    port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
)
try:

Connection errors raised here won't be caught. Shouldn't we move the connect call inside the try block?

Suggested change
- conn = await asyncpg.connect(
-     user=os.environ["INSIGHTS_DB_USERNAME"],
-     password=os.environ["INSIGHTS_DB_PASSWORD"],
-     database=os.environ["INSIGHTS_DB_DATABASE"],
-     host=os.environ["INSIGHTS_DB_WRITE_HOST"],
-     port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
- )
- try:
+ try:
+     conn = await asyncpg.connect(
+         user=os.environ["INSIGHTS_DB_USERNAME"],
+         password=os.environ["INSIGHTS_DB_PASSWORD"],
+         database=os.environ["INSIGHTS_DB_DATABASE"],
+         host=os.environ["INSIGHTS_DB_WRITE_HOST"],
+         port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
+     )
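One thing to watch when moving the connect call inside the try block: if the connection attempt fails, `conn` is never bound, so a `finally` clause that closes it unconditionally would raise its own error. A minimal sketch of a guarded version follows. The helper name and the injected `connect`/`work` callables are hypothetical, chosen so the example runs without asyncpg; in the PR, `connect` would be the `asyncpg.connect(...)` call shown above.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def run_with_connection(connect, work):
    """Open a connection, run `work(conn)`, and always clean up.

    Returns the result of `work`, or None if connecting or working failed.
    """
    conn = None
    try:
        # Connecting inside the try block means connection errors are
        # handled by the same except clause as query errors.
        conn = await connect()
        return await work(conn)
    except Exception as exc:  # broad on purpose for this sketch
        logger.warning("database operation failed: %s", exc)
        return None
    finally:
        # Guard the close: conn is still None when connect() itself failed.
        if conn is not None:
            await conn.close()
```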

mbani01 previously approved these changes Mar 6, 2026
Copilot AI review requested due to automatic review settings March 18, 2026 09:56
Copilot AI (Contributor) left a comment

Pull request overview

Adds an automated vulnerability-scanning step to the git integration worker, implemented as a Go OSV-Scanner-based binary invoked from Python, with results persisted to the insights DB.

Changes:

  • Run a new VulnerabilityScannerService on the first clone batch and record an execution via OperationType.VULNERABILITY_SCAN.
  • Introduce a new Go-based vulnerability-scanner module/binary (OSV Scanner SDK) plus Docker build plumbing.
  • Extend run_shell_command to propagate return codes and optionally stream stderr.
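For readers unfamiliar with the pattern in the last bullet, here is a rough sketch of a subprocess helper that streams stderr line by line and carries the return code on failure. It is not the PR's `run_shell_command`; the function name, the `on_stderr_line` callback, and the `CommandFailed` exception are hypothetical stand-ins.

```python
import asyncio
from typing import Callable, List, Optional, Sequence


class CommandFailed(Exception):
    """Raised on non-zero exit; keeps the return code for the caller."""

    def __init__(self, message: str, returncode: int, stderr: str = ""):
        super().__init__(message)
        self.returncode = returncode
        self.stderr = stderr


async def run_command(
    cmd: Sequence[str],
    on_stderr_line: Optional[Callable[[str], None]] = None,
) -> str:
    """Run cmd, forward stderr lines as they arrive, and return stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )

    async def pump_stderr() -> List[str]:
        lines: List[str] = []
        async for raw in proc.stderr:  # yields one line at a time
            line = raw.decode(errors="replace").rstrip("\r\n")
            lines.append(line)
            if on_stderr_line is not None:
                on_stderr_line(line)
        return lines

    async def read_stdout() -> str:
        return (await proc.stdout.read()).decode(errors="replace")

    # Drain both pipes concurrently so neither can fill up and block the child.
    stdout, stderr_lines = await asyncio.gather(read_stdout(), pump_stderr())
    returncode = await proc.wait()
    if returncode != 0:
        raise CommandFailed(
            f"command exited with code {returncode}",
            returncode,
            "\n".join(stderr_lines),
        )
    return stdout
```

Keeping the return code on the exception is what lets a caller tell a signal kill (negative value in Python) apart from an ordinary failure.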

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| services/apps/git_integration/src/crowdgit/worker/repository_worker.py | Invokes vulnerability scan on first clone batch. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py | Python wrapper for scanner subprocess + execution tracking + stale scan cleanup. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner.go | Core Go scanning logic + normalization + DB persistence. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/types.go | Shared response / DB model types for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/main.go | CLI entrypoint + JSON stdout formatting. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/go.mod | Go module definition for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/go.sum | Go dependency lockfile for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/db.go | Insights DB connection + upsert/resolve strategy + scan tracking. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/config.go | Reads target path + insights DB env configuration. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/README.md | Design/behavior documentation for scanner component. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/.gitignore | Ignores local Go build artifacts. |
| services/apps/git_integration/src/crowdgit/services/utils.py | Adds stderr streaming + returncode propagation in run_shell_command. |
| services/apps/git_integration/src/crowdgit/services/init.py | Exposes VulnerabilityScannerService. |
| services/apps/git_integration/src/crowdgit/server.py | Wires scanner service into app lifecycle / worker init. |
| services/apps/git_integration/src/crowdgit/errors.py | Adds returncode field to CommandExecutionError. |
| services/apps/git_integration/src/crowdgit/enums.py | Adds OperationType.VULNERABILITY_SCAN. |
| scripts/services/docker/Dockerfile.git_integration | Builds + ships vulnerability-scanner binary in the image. |
| backend/.env.dist.local | Adds local insights DB env vars. |
| backend/.env.dist.composed | Adds composed insights DB host env var. |


Comment on lines +102 to +107
conn = await asyncpg.connect(
    user=os.environ["INSIGHTS_DB_USERNAME"],
    password=os.environ["INSIGHTS_DB_PASSWORD"],
    database=os.environ["INSIGHTS_DB_DATABASE"],
    host=os.environ["INSIGHTS_DB_WRITE_HOST"],
    port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
Comment on lines +38 to +52
config.InsightsDatabase.User = os.Getenv("INSIGHTS_DB_USERNAME")
config.InsightsDatabase.Password = os.Getenv("INSIGHTS_DB_PASSWORD")
config.InsightsDatabase.DBName = os.Getenv("INSIGHTS_DB_DATABASE")
config.InsightsDatabase.Host = os.Getenv("INSIGHTS_DB_WRITE_HOST")
if portStr := os.Getenv("INSIGHTS_DB_PORT"); portStr != "" {
	if port, err := strconv.Atoi(portStr); err == nil {
		config.InsightsDatabase.Port = port
	}
}
config.InsightsDatabase.SSLMode = os.Getenv("INSIGHTS_DB_SSLMODE")
if poolMaxStr := os.Getenv("INSIGHTS_DB_POOL_MAX"); poolMaxStr != "" {
	if poolMax, err := strconv.Atoi(poolMaxStr); err == nil {
		config.InsightsDatabase.PoolMax = poolMax
	}
}
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@epipav force-pushed the feat/git-osv-vulnerabilities branch from 5f955ac to 9b7b58e March 19, 2026 12:37
@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 6 total unresolved issues (including 4 from previous reviews).


Signed-off-by: anil <epipav@gmail.com>
@epipav force-pushed the feat/git-osv-vulnerabilities branch from 9b7b58e to 706ee5c March 19, 2026 12:54
Signed-off-by: anil <epipav@gmail.com>

4 participants